Multiple Similarity Measures and Source-Pair Information in Story Link Detection
Abstract
State-of-the-art story link detection systems, that is, systems that determine whether two stories are about the same event (i.e., linked), are usually based on the cosine similarity between the two stories. This paper presents a method for improving the performance of a link detection system by combining a variety of similarity measures with source-pair-specific statistical information. The utility of several similarity measures, including cosine, Hellinger, Tanimoto, and clarity, was investigated both individually and in combination. We also compared several machine learning techniques for combining the different types of information: SVMs, voting, and decision trees, each of which makes use of the similarity and statistical information differently. Our experimental results indicate that combining similarity measures and source-pair-specific statistical information with an SVM provides the largest improvement in estimating whether two stories are linked; the resulting system was the best-performing link detection system at TDT-2002.
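The abstract names four similarity measures and voting as one way to combine them. The Python sketch below is illustrative only and is not the paper's implementation. It assumes common textbook forms of each measure (Hellinger as the Bhattacharyya affinity, Tanimoto as the extended Jaccard coefficient, clarity as a clarity-adjusted KL score with background smoothing). The helper names, the per-source-pair statistics table, the smoothing weight, and the thresholds are all hypothetical, and simple majority voting stands in for the SVM combination that the abstract reports as best.

```python
import math
from collections import Counter

# Stories are term-count vectors: dict/Counter mapping term -> count.

def _dist(counts):
    """Normalize raw term counts into a probability distribution."""
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}

def _dot(a, b):
    return sum(a[t] * b[t] for t in a.keys() & b.keys())

def cosine(a, b):
    """Cosine of the angle between two term-count vectors."""
    na, nb = math.sqrt(_dot(a, a)), math.sqrt(_dot(b, b))
    return _dot(a, b) / (na * nb) if na and nb else 0.0

def hellinger_affinity(a, b):
    """Bhattacharyya coefficient sum_t sqrt(p_t * q_t); 1.0 for identical
    distributions (the Hellinger distance would be sqrt(1 - affinity))."""
    p, q = _dist(a), _dist(b)
    return sum(math.sqrt(p[t] * q[t]) for t in p.keys() & q.keys())

def tanimoto(a, b):
    """Extended Jaccard: a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = _dot(a, b)
    denom = _dot(a, a) + _dot(b, b) - dot
    return dot / denom if denom else 0.0

def clarity_adjusted(a, b, background, lam=0.5, floor=1e-9):
    """Symmetrized -KL(p||q) + KL(p||bg), which simplifies to
    sum_t p_t * log(q_t / bg_t). q is smoothed with the background model
    (weight lam, an assumed value) so the logarithm stays finite."""
    def one_way(x, y):
        p, q = _dist(x), _dist(y)
        score = 0.0
        for t, pt in p.items():
            bg = background.get(t, floor)
            qt = (1 - lam) * q.get(t, 0.0) + lam * bg
            score += pt * math.log(qt / bg)
        return score
    return 0.5 * (one_way(a, b) + one_way(b, a))

def source_pair_zscore(score, pair, stats):
    """Normalize a raw score by the (mean, std) observed for this source pair,
    e.g. ("newswire", "asr"). The stats table is a hypothetical input."""
    mean, std = stats[pair]
    return (score - mean) / std if std > 0 else 0.0

def majority_vote(scores, thresholds):
    """Declare 'linked' when more than half of the measures clear their thresholds."""
    votes = sum(1 for name, s in scores.items() if s >= thresholds[name])
    return 2 * votes > len(scores)

def all_scores(a, b, background):
    return {
        "cosine": cosine(a, b),
        "hellinger": hellinger_affinity(a, b),
        "tanimoto": tanimoto(a, b),
        "clarity": clarity_adjusted(a, b, background),
    }

if __name__ == "__main__":
    # Toy background unigram model and stories (made-up numbers for illustration).
    bg = {"the": 0.05, "storm": 0.001, "hits": 0.002, "coast": 0.001,
          "election": 0.001, "results": 0.002}
    s1 = Counter("storm hits coast storm".split())
    s2 = Counter("the storm hits the coast".split())
    s3 = Counter("election results".split())
    thresholds = {"cosine": 0.3, "hellinger": 0.5, "tanimoto": 0.2, "clarity": 0.0}
    for other in (s2, s3):
        scores = all_scores(s1, other, bg)
        label = "linked" if majority_vote(scores, thresholds) else "not linked"
        print({k: round(v, 3) for k, v in scores.items()}, label)
```

Instead of thresholding each measure, the per-measure scores (optionally z-normalized per source pair with `source_pair_zscore`) could be stacked into a feature vector for an SVM classifier, which is the combination the abstract reports as most effective. The thresholds and smoothing weight here are placeholders that would have to be tuned on labeled story pairs.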
Similar resources
Description of NTU Approach to Link Detection Task in TDT2001
We participated in the link detection task and submitted four runs, including both manual and ASR transcription for audio resources; and both English translation and original Chinese character source stream for Mandarin sources. This paper will propose a method to tell if a pair of news stories discusses the same topic. Several issues are addressed, e.g., how to represent a news story, how to m...
Optimizing Story Link Detection is not Equivalent to Optimizing New Event Detection
Link detection has been regarded as a core technology for the Topic Detection and Tracking tasks of new event detection. In this paper we formulate story link detection and new event detection as information retrieval task and hypothesize on the impact of precision and recall on both systems. Motivated by these arguments, we introduce a number of new performance enhancing techniques including p...
Applying Dynamic Co-occurrence in Story Link Detection
Story link detection is part of a broader initiative called Topic Detection and Tracking, which is defined to be the task of determining whether two stories, such as news articles or radio broadcasts, are about the same event, or linked. In order to mine more information from the contents of the stories being compared and achieve a more high-powered system, motivated by the idea of the word co-...
Story Link Detection based on Dynamic Information Extending
Topic Detection and Tracking refers to automatic techniques for locating topically related materials in streams of data. As the core technology of it, story link detection is to determine whether two stories are about the same topic. To overcome the limitation of the story length and the topic dynamic evolution problem in data streams, this paper presents a method of applying dynamic informatio...
Intuitionistic Fuzzy Information Measures with Application in Rating of Township Development
Predominantly in the faltering atmosphere, the precise value of some factors is difficult to measure. Though, it can be easily approximated by intuitionistic fuzzy linguistic term in the real-life world problem. To deal with such situations, in this paper two information measures based on trigonometric function for intuitionistic fuzzy sets, which are a generalized version of the fuzzy informat...
Publication date: 2004